
Conversation

@MrBurmark (Member)

Summary

Fix the CUDA and HIP matrix transpose tutorials: fix spacing, add proper synchronization, and map threads properly in the teams implementation.

@MrBurmark MrBurmark requested review from a team and artv3 September 21, 2025 17:24
@artv3 (Member) commented Sep 21, 2025

I think we also need to clean up the problem description to highlight the transpose of the indices, as well as the CPU implementations, to make them consistent with the GPU versions.

@artv3 (Member) commented Sep 21, 2025

Thanks for getting this started @MrBurmark. I took a follow-up pass, but I think I could use a second set of eyes on the Kernel implementation. Would anyone in @LLNL/raja-core be able to take a look?

RAJA::loop_icount<hip_threads_x>(ctx, row_tile, [&] (int col, int tx) {

  d_Atview(col, row) = Tile_Array[ty][tx];
  d_Atview(row, col) = Tile_Array[tx][ty];
@artv3 (Member)

@MrBurmark, I switched it around so it's clear that the x and y threads have been transposed in shared memory. I'm not too sure how to express that in Kernel.

@MrBurmark (Member, Author)

Would it be clearer to call row and col here rowt and colt?

@artv3 (Member)

good idea!

@rhornung67 rhornung67 added this to the Dec 2025 Release milestone Nov 11, 2025


Development

Successfully merging this pull request may close issue: Matrix Transpose Tutorial Cleanup

4 participants